Definition (Shattering). A hypothesis class $H$ shatters a finite set $C \subset X$ if the restriction of $H$ to $C$ is the set of all functions from $C$ to $\{0, 1\}$; that is, $|H_C| = 2^{|C|}$.
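As a quick sanity check, shattering of a finite class over a finite domain can be verified by brute force. The following is a minimal sketch; the threshold class and the small integer domain are assumptions chosen for illustration, not part of the definition:

```python
def shatters(H, C):
    """H shatters C iff hypotheses in H realize all 2^|C| labelings of C."""
    labelings = {tuple(h(x) for x in C) for h in H}  # the restriction H_C
    return len(labelings) == 2 ** len(C)

# Assumed example class: thresholds h_t(x) = 1[x >= t] on {0,...,5}.
H = [lambda x, t=t: int(x >= t) for t in range(6)]
print(shatters(H, [2]))     # True: both labelings of a single point are realized
print(shatters(H, [1, 3]))  # False: no threshold labels 1 -> 1 and 3 -> 0
```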
Corollary. Let $H$ be a hypothesis class of functions $X \to \{0, 1\}$ and let $m$ be a training set size. Assume that there exists a set $C \subset X$ of size $2m$ that is shattered by $H$. Then, for any learning algorithm $A$, there exist a distribution $D$ over $X \times \{0, 1\}$ and a predictor $h \in H$ such that $L_D(h) = 0$, but with probability at least $1/7$ over the choice of $S \sim D^m$ we have $L_D(A(S)) \ge 1/8$.
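The corollary can be felt numerically. The sketch below instantiates one concrete (assumed) scenario: a uniform distribution over a shattered set of size $2m$, labels given by a random realizable target, and a learner that memorizes $S$ and predicts $0$ on unseen points:

```python
import random

# Monte Carlo illustration of the corollary (the setup is assumed for the
# example): C is a shattered set of size 2m, D is uniform over C with labels
# given by a random target f (so some h in H has L_D(h) = 0), and the learner
# memorizes S and predicts 0 on unseen points.
random.seed(0)
m, trials, bad = 10, 10_000, 0
C = list(range(2 * m))
for _ in range(trials):
    f = {x: random.randint(0, 1) for x in C}       # random realizable target
    S = {x: f[x] for x in random.choices(C, k=m)}  # i.i.d. sample of size m
    loss = sum(S.get(x, 0) != f[x] for x in C) / len(C)
    bad += loss >= 1 / 8
print(bad / trials)  # far above the guaranteed 1/7 for this memorizing learner
```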
Philosophically, if someone can explain every phenomenon, their explanations are worthless.
Definition (VC Dimension). The VC-dimension of a hypothesis class $H$, denoted $\mathrm{VCdim}(H)$, is the maximal size of a set $C \subset X$ that can be shattered by $H$. If $H$ can shatter sets of arbitrarily large size, we say that $H$ has infinite VC-dimension.
Theorem. Let $H$ be a class of infinite VC-dimension. Then $H$ is not PAC learnable.
Proof. Since $H$ has infinite VC-dimension, for any training set size $m$ there exists a shattered set of size $2m$, and the claim follows from the previous corollary.
How does one show that $\mathrm{VCdim}(H) = d$? One must establish two things: there exists a set $C$ of size $d$ that is shattered by $H$, and no set $C$ of size $d + 1$ is shattered by $H$.

Example (finite classes). For any finite hypothesis class $H$, we have $\mathrm{VCdim}(H) \le \log_2 |H|$. This is because for any set $C$, $|H_C| \le |H|$; therefore, if $2^{|C|} > |H|$, then we cannot shatter $C$. Both conditions can be checked by brute force for small finite classes, as in the sketch below.
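A minimal brute-force sketch, reusing the shattering check from above; the threshold class is again an assumed example, and the final line checks the $\log_2 |H|$ bound numerically:

```python
import math
from itertools import combinations

def shatters(H, C):
    """H shatters C iff hypotheses in H realize all 2^|C| labelings of C."""
    return len({tuple(h(x) for x in C) for h in H}) == 2 ** len(C)

def vcdim(H, domain):
    """Largest d such that some size-d subset of domain is shattered (brute force)."""
    d = 0
    for size in range(1, len(domain) + 1):
        if any(shatters(H, C) for C in combinations(domain, size)):
            d = size
        else:
            break  # if no size-k set is shattered, no larger set is either
    return d

# Assumed example class: thresholds h_t(x) = 1[x >= t] on {0,...,5}.
H = [lambda x, t=t: int(x >= t) for t in range(6)]
print(vcdim(H, range(6)))  # 1: thresholds shatter singletons but no pair
print(math.log2(len(H)))   # ~2.58, consistent with VCdim(H) <= log2|H|
```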
Definition (Growth Function). Let $H$ be a hypothesis class. The growth function of $H$, denoted $\tau_H : \mathbb{N} \to \mathbb{N}$, is defined as

$$\tau_H(m) = \max_{C \subset X : |C| = m} |H_C|.$$

Sauer's Lemma. Let $H$ be a hypothesis class with $\mathrm{VCdim}(H) \le d < \infty$. Then, for all $m$, $\tau_H(m) \le \sum_{i=0}^d \binom{m}{i}$. In particular, if $m > d + 1$ then $\tau_H(m) \le (em/d)^d$.
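For a finite class over a finite domain, the growth function can be computed directly and compared with Sauer's bound. The class below is an assumed example with $\mathrm{VCdim}(H) = 1$:

```python
import math
from itertools import combinations

def growth(H, domain, m):
    """tau_H(m): the maximum number of labelings H induces on any m domain points."""
    return max(len({tuple(h(x) for x in C) for h in H})
               for C in combinations(domain, m))

# Assumed example: thresholds h_t(x) = 1[x >= t] on {0,...,9}, with VCdim = 1.
H = [lambda x, t=t: int(x >= t) for t in range(10)]
d = 1
for m in range(2, 8):
    sauer = sum(math.comb(m, i) for i in range(d + 1))
    print(m, growth(H, range(10), m), sauer, round((math.e * m / d) ** d, 1))
```

Here the growth function is $m + 1$, which meets Sauer's binomial sum with equality and stays below $(em/d)^d$.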
Proof. The first inequality is proved by induction on $m$ (we omit that argument here); we show how the polynomial bound follows from it. For $0 \le j \le d \le k$,

$$\begin{aligned} \binom{k}{j} &= \frac{k!}{j!\,(k-j)!} = \frac{k(k-1)\cdots(k-j+1)}{j!} \\ &= \frac{k}{d}\cdot\frac{k-1}{d}\cdots\frac{k-j+1}{d}\cdot\frac{d^j}{j!} \\ &\le \left(\frac{k}{d}\right)^{j} \frac{d^j}{j!} \le \left(\frac{k}{d}\right)^{d} \frac{d^j}{j!}. \end{aligned}$$

Hence,
$$\sum_{j=0}^d \binom{k}{j} \le \sum_{j=0}^d \left(\frac{k}{d}\right)^{d} \frac{d^j}{j!} \le \left(\frac{k}{d}\right)^{d} \sum_{j=0}^{\infty} \frac{d^j}{j!} = \left(\frac{ek}{d}\right)^{d}.$$

Theorem. Let $H$ be a class and let $\tau_H$ be its growth function. Then, for every $D$ and every $\delta \in (0, 1)$, with probability at least $1-\delta$ over the choice of $S \sim D^m$, for every $h \in H$ we have

$$|L_D(h) - L_S(h)| \le \frac{4 + \sqrt{\log(\tau_H(2m))}}{\delta\sqrt{2m}}.$$

If $\mathrm{VCdim}(H) = d$, then Sauer's lemma gives $\tau_H(2m) \le (2em/d)^d$ and hence $\log(\tau_H(2m)) \le d \log(2em/d)$. Assuming for simplicity that $\sqrt{d \log(2em/d)} \ge 4$, we obtain

$$|L_D(h) - L_S(h)| \le \frac{1}{\delta} \sqrt{\frac{2d \log(2em/d)}{m}}.$$
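To get a feel for the scale of this guarantee, the simplified bound can be evaluated numerically; the parameter values below are arbitrary choices for illustration:

```python
import math

def vc_bound(m, d, delta):
    """(1/delta) * sqrt(2 d log(2 e m / d) / m), the simplified bound above."""
    assert math.sqrt(d * math.log(2 * math.e * m / d)) >= 4, "simplifying assumption"
    return math.sqrt(2 * d * math.log(2 * math.e * m / d) / m) / delta

# Arbitrary illustrative values: d = 10, delta = 0.1, growing sample sizes.
# The 1/delta factor, inherited from the Markov step in the proof below,
# makes this bound loose for small delta.
for m in (10_000, 100_000, 1_000_000):
    print(m, round(vc_bound(m, d=10, delta=0.1), 4))
```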
Proof. We will start by showing that

$$\mathbb{E}_{S \sim D^m}\left[\sup_{h \in H} |L_D(h) - L_S(h)|\right] \le \frac{4 + \sqrt{\log(\tau_H(2m))}}{\sqrt{2m}}.$$

Since the random variable $\sup_{h \in H} |L_D(h) - L_S(h)|$ is nonnegative, the theorem follows directly from this bound by Markov's inequality: for a nonnegative $Z$, $\mathbb{P}[Z > \mathbb{E}[Z]/\delta] \le \delta$, which is the source of the extra $1/\delta$ factor.
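The Markov step is worth internalizing, since it is what converts an in-expectation bound into a high-probability one. A short numeric illustration, with an arbitrary choice of distribution:

```python
import random

# Numeric illustration of the Markov step (the distribution is an arbitrary
# choice): if a nonnegative Z satisfies E[Z] <= B, then P[Z > B/delta] <= delta.
random.seed(0)
B, delta, trials = 1.0, 0.1, 100_000
samples = [random.expovariate(1.0) for _ in range(trials)]  # E[Z] = 1 = B
print(sum(z > B / delta for z in samples) / trials, "<=", delta)
```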
To bound the left-hand side of the expectation bound above, we first note that for every $h \in H$ we can rewrite $L_D(h) = \mathbb{E}_{S' \sim D^m}[L_{S'}(h)]$, where $S' = z'_1, \ldots, z'_m$ is an additional i.i.d. sample. Therefore,
$$\mathbb{E}_{S \sim D^m}\left[\sup_{h \in H} |L_D(h) - L_S(h)|\right] = \mathbb{E}_{S \sim D^m}\left[\sup_{h \in H} \left|\mathbb{E}_{S' \sim D^m} L_{S'}(h) - L_S(h)\right|\right].$$

A generalization of the triangle inequality yields
$$\left|\mathbb{E}_{S' \sim D^m}\left[L_{S'}(h) - L_S(h)\right]\right| \le \mathbb{E}_{S' \sim D^m}\left|L_{S'}(h) - L_S(h)\right|,$$

and the fact that the supremum of an expectation is at most the expectation of the supremum yields
$$\sup_{h \in H} \mathbb{E}_{S' \sim D^m}\left|L_{S'}(h) - L_S(h)\right| \le \mathbb{E}_{S' \sim D^m} \sup_{h \in H}\left|L_{S'}(h) - L_S(h)\right|.$$

Formally, the previous two inequalities follow from Jensen's inequality. Combining all we obtain
$$\begin{aligned} \mathbb{E}_{S \sim D^m}\left[\sup_{h \in H} |L_D(h) - L_S(h)|\right] &\le \mathbb{E}_{S, S' \sim D^m}\left[\sup_{h \in H} |L_{S'}(h) - L_S(h)|\right] \\ &= \mathbb{E}_{S, S' \sim D^m}\left[\sup_{h \in H} \frac{1}{m}\left|\sum_{i=1}^m \left(\ell(h, z'_i) - \ell(h, z_i)\right)\right|\right]. \end{aligned} \tag{1}$$

The expectation on the right-hand side is over a choice of two i.i.d. samples $S = z_1, \ldots, z_m$ and $S' = z'_1, \ldots, z'_m$. Since all of these $2m$ vectors are chosen i.i.d., nothing changes if we swap the names of the random vectors $z_i$ and $z'_i$. If we do so, the term $\left(\ell(h, z'_i) - \ell(h, z_i)\right)$ in Equation (1) is replaced by $-\left(\ell(h, z'_i) - \ell(h, z_i)\right)$. It follows that for every $\boldsymbol{\sigma} \in \{\pm 1\}^m$, Equation (1) equals
$$\mathbb{E}_{S, S' \sim D^m}\left[\sup_{h \in H} \frac{1}{m}\left|\sum_{i=1}^m \sigma_i\left(\ell(h, z'_i) - \ell(h, z_i)\right)\right|\right].$$

Since this holds for every $\boldsymbol{\sigma} \in \{\pm 1\}^m$, it also holds when each component of $\boldsymbol{\sigma}$ is sampled uniformly at random from $\{\pm 1\}$, a distribution we denote $U_{\pm}$. Hence, Equation (1) also equals
$$\mathbb{E}_{\boldsymbol{\sigma} \sim U_{\pm}^m} \mathbb{E}_{S, S' \sim D^m}\left[\sup_{h \in H} \frac{1}{m}\left|\sum_{i=1}^m \sigma_i\left(\ell(h, z'_i) - \ell(h, z_i)\right)\right|\right],$$

and by the linearity of expectation it also equals
$$\mathbb{E}_{S, S' \sim D^m} \mathbb{E}_{\boldsymbol{\sigma} \sim U_{\pm}^m}\left[\sup_{h \in H} \frac{1}{m}\left|\sum_{i=1}^m \sigma_i\left(\ell(h, z'_i) - \ell(h, z_i)\right)\right|\right].$$

Next, fix $S$ and $S'$, and let $C$ be the set of instances appearing in $S$ and $S'$. Then, we can take the supremum only over $h \in H_C$. Therefore,
$$\mathbb{E}_{\boldsymbol{\sigma} \sim U_{\pm}^m}\left[\sup_{h \in H} \frac{1}{m}\left|\sum_{i=1}^m \sigma_i\left(\ell(h, z'_i) - \ell(h, z_i)\right)\right|\right] = \mathbb{E}_{\boldsymbol{\sigma} \sim U_{\pm}^m}\left[\max_{h \in H_C} \frac{1}{m}\left|\sum_{i=1}^m \sigma_i\left(\ell(h, z'_i) - \ell(h, z_i)\right)\right|\right].$$

Fix some $h \in H_C$ and denote $\theta_h = \frac{1}{m} \sum_{i=1}^m \sigma_i\left(\ell(h, z'_i) - \ell(h, z_i)\right)$. Since $\mathbb{E}[\theta_h] = 0$ and $\theta_h$ is an average of independent variables, each of which takes values in $[-1, 1]$, we have by Hoeffding's inequality that for every $\rho > 0$,
$$\mathbb{P}\left[|\theta_h| > \rho\right] \le 2 \exp\left(-2m\rho^2\right).$$

Applying the union bound over $h \in H_C$, we obtain that for any $\rho > 0$,
$$\mathbb{P}\left[\max_{h \in H_C} |\theta_h| > \rho\right] \le 2\,|H_C| \exp\left(-2m\rho^2\right).$$
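Before invoking the lemma, here is a small Monte Carlo illustration of the union-bound step just taken; the difference vectors are assumed values standing in for $\ell(h, z'_i) - \ell(h, z_i)$:

```python
import random

# Monte Carlo illustration of the union-bound step (all values assumed): fix
# one difference vector per "hypothesis" with entries in [-1, 1], draw random
# signs sigma, and compare P[max_h |theta_h| > rho] against the sum of the
# individual tail probabilities.
random.seed(0)
m, n_hyp, trials, rho = 100, 20, 5_000, 0.15
diffs = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n_hyp)]

max_tail, single_tail_sum = 0, 0
for _ in range(trials):
    sigma = [random.choice((-1, 1)) for _ in range(m)]
    thetas = [abs(sum(s * d for s, d in zip(sigma, dh))) / m for dh in diffs]
    max_tail += max(thetas) > rho
    single_tail_sum += sum(th > rho for th in thetas)
# The max-event indicator is dominated by the sum of indicators in every trial.
print(max_tail / trials, "<=", single_tail_sum / trials)
```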
Finally, the lemma below tells us that the preceding implies

$$\mathbb{E}\left[\max_{h \in H_C} |\theta_h|\right] \le \frac{4 + \sqrt{\log(|H_C|)}}{\sqrt{2m}}.$$

Combining all of the above with the definition of $\tau_H$, we have shown that
$$\mathbb{E}_{S \sim D^m}\left[\sup_{h \in H} |L_D(h) - L_S(h)|\right] \le \frac{4 + \sqrt{\log(\tau_H(2m))}}{\sqrt{2m}}.$$

Lemma. Let $X$ be a random variable and $x' \in \mathbb{R}$ a scalar, and assume that there exist $a > 0$ and $b \ge e$ such that for all $t \ge 0$ we have $\mathbb{P}\left[|X - x'| > t\right] \le 2b\,e^{-t^2/a^2}$. Then, $\mathbb{E}\left[|X - x'|\right] \le a\left(2 + \sqrt{\log(b)}\right)$.
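A quick Monte Carlo sanity check of the lemma on an assumed example, the maximum of $n$ Gaussians, which satisfies the tail condition with $a = \sqrt{2}\,s$ and $b = n$:

```python
import math
import random

# Sanity check of the lemma (an assumed example, not part of the proof): for
# X = max_i |G_i| with G_1,...,G_n i.i.d. N(0, s^2), the standard Gaussian tail
# plus a union bound gives P[X > t] <= 2 n exp(-t^2 / (2 s^2)), i.e.
# a = sqrt(2) * s and b = n, so the lemma predicts E[X] <= a*(2 + sqrt(log b)).
random.seed(0)
n, s, trials = 50, 1.0, 20_000
emp = sum(max(abs(random.gauss(0, s)) for _ in range(n))
          for _ in range(trials)) / trials
a, b = math.sqrt(2) * s, n
print(round(emp, 3), "<=", round(a * (2 + math.sqrt(math.log(b))), 3))
```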
Proof. For all $i = 0, 1, 2, \ldots$ denote $t_i = a\left(i + \sqrt{\log(b)}\right)$. Since $t_i$ is monotonically increasing, we have
$$\mathbb{E}\left[|X - x'|\right] \le a\sqrt{\log(b)} + \sum_{i=1}^{\infty} t_i\, \mathbb{P}\left[|X - x'| > t_{i-1}\right],$$

since $|X - x'|$ is at most $t_0 = a\sqrt{\log(b)}$ on the event $\{|X - x'| \le t_0\}$ and at most $t_i$ on each event $\{t_{i-1} < |X - x'| \le t_i\}$. Using the assumption in the lemma we have
$$\begin{aligned} \sum_{i=1}^{\infty} t_i\, \mathbb{P}\left[|X - x'| > t_{i-1}\right] &\le 2ab \sum_{i=1}^{\infty} \left(i + \sqrt{\log(b)}\right) e^{-\left(i - 1 + \sqrt{\log(b)}\right)^2} \\ &\le 2ab \int_{1+\sqrt{\log(b)}}^{\infty} x\, e^{-(x-1)^2}\, dx \\ &= 2ab \int_{\sqrt{\log(b)}}^{\infty} (y+1)\, e^{-y^2}\, dy \\ &\le 4ab \int_{\sqrt{\log(b)}}^{\infty} y\, e^{-y^2}\, dy \\ &= 2ab \left[-e^{-y^2}\right]_{\sqrt{\log(b)}}^{\infty} = 2ab/b = 2a. \end{aligned}$$

Combining the preceding inequalities we conclude our proof.